Abstract

We address the problem of estimating the effect of intervening
on a set of variables X from experiments on a different set, Z,
that is more accessible to manipulation. This problem, which we
call z-identifiability, reduces to ordinary identifiability when
Z = ∅ and, like the latter, can be given a syntactic
characterization using the do-calculus [Pearl, 1995; 2000]. We
provide a graphical necessary and sufficient condition for
z-identifiability for arbitrary sets X, Z, and Y (the outcomes).
We further develop a complete algorithm for computing the causal
effect of X on Y using information provided by experiments on Z.
Finally, we use our results to prove completeness of do-calculus
relative to z-identifiability, a result that does not follow
from completeness relative to ordinary identifiability.

1  Introduction 

The relation between passive and experimental observations, and
how they can aid the estimation of causal effects, is of central
interest in the empirical sciences.

In this line of research, the identification problem (ID, for
short) asks whether causal effects can be computed from the
joint distribution P over the observed variables, and
theoretical knowledge encoded in the form of a causal diagram G.

This problem has been extensively studied in the literature.
[Pearl, 1995; 2000] gave it a rigorous mathematical treatment
based on the structural semantics and introduced several
graphical conditions, such as the “back-door” and “front-door”
criteria, which were later generalized by his do-calculus. In
the last decades, a number of conditions have emerged for
non-parametric identifiability, such as the ones given by
[Spirtes, Glymour, and Scheines, 1993; Galles and Pearl, 1995;
Pearl and Robins, 1995; Halpern, 1998; Kuroki and Miyakawa,
1999]. In a series of breakthrough results starting with the
development of the concept of C-component [Tian and Pearl,
2002], the do-calculus was finally shown to be complete [Huang
and Valtorta, 2006; Shpitser and Pearl, 2006]. This result
implies that there exists a finite sequence of applications of
the rules of do-calculus that derives the target causal effect Q
in terms of the observational distribution P if (and only if) Q
is identifiable. The same work also provided algorithms that
return a mapping from P to Q whenever Q is identifiable.

In real-world applications, it is not uncommon that the quantity
Q is unidentifiable, i.e., the distribution P together with the
graph G is not able to unambiguously determine Q. A natural
question is whether the investigator could perform some
auxiliary experiments (not necessarily spelled out in Q) that
would enable him/her to estimate the desired causal effects.

For instance, consider the causal diagram G in Fig. 1(a).
Suppose one is interested in assessing the effect Q of
cholesterol levels (X) on heart disease (Y), and data about
subjects’ diet (Z) is also collected. Q is unidentifiable from
the assumptions embodied in G, and it is infeasible in reality
to control subjects’ cholesterol level by intervention. Assume
that an experiment can be conducted in which the subjects’ diet
(Z) is randomized; a natural question is whether Q is computable
given this additional piece of experimental information.

Surprisingly, this ubiquitous problem has not received a
thorough formal treatment. We introduce a variation of the ID
problem to fill this gap. Consider a setting in which, in
addition to the information available in an ordinary ID instance
(distribution P and graph G), further experiments can be
performed over a set of variables Z; the task is to decide
whether the target causal effects can be computed from the
available information. This extension generalizes the ID problem
(when Z = ∅ the two problems coincide) and is called here the
z-identification problem (zID, for short). The experiments on Z
are called surrogate experiments, for obvious reasons.
Syntactically, the zID problem amounts to transforming
$P(y \mid \hat{x})$ (see footnote 1) into an equivalent
expression in do-calculus such that only members of Z may carry
the hat symbol. Applying this rationale to the example given
above (Fig. 1(a)) entails the following reduction in the
do-calculus. First apply Rule 3 to add $\hat{z}$,

$$P(y \mid \hat{x}) = P(y \mid \hat{x}, \hat{z}) \quad \text{since } (Y \perp\!\!\!\perp Z \mid X)_{G_{\overline{X}\,\overline{Z}}}$$

Then apply Rule 2 to exchange $\hat{x}$ with $x$:

$$P(y \mid \hat{x}, \hat{z}) = P(y \mid x, \hat{z}) \quad \text{since } (Y \perp\!\!\!\perp X \mid Z)_{G_{\underline{X}\,\overline{Z}}}$$

This last expression can be rewritten as

$$P(y \mid x, \hat{z}) = \frac{P(y, x \mid \hat{z})}{P(x \mid \hat{z})} \qquad (1)$$

This expression shows that performing an experiment on Z
suffices to yield “identifiability” of the causal effect of X on
Y without experimenting over X (see footnote 2).
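The reduction leading to eq. (1) can be checked numerically on a small discrete model. The sketch below is illustrative only: the figure is not reproduced here, so the graph (Z -> X -> Y with one latent variable shared by Z and X and another shared by Z and Y) is an assumption chosen so that both independence conditions used in the derivation hold, and all probability tables are made-up numbers.

```python
from itertools import product

# Hypothetical binary model (all numbers are illustrative assumptions).
# Assumed graph: Z -> X -> Y, latent U1 -> {Z, X}, latent U2 -> {Z, Y}.
P_U1 = {0: 0.6, 1: 0.4}
P_U2 = {0: 0.3, 1: 0.7}


def mechanism(value, forced, p_one):
    """P(variable = value); a point mass when the variable is intervened on."""
    if forced is not None:
        return 1.0 if value == forced else 0.0
    return p_one if value == 1 else 1.0 - p_one


def joint(do_z=None, do_x=None):
    """Exact P(z, x, y) under the given interventions, by enumeration."""
    dist = {}
    for u1, u2, z, x, y in product((0, 1), repeat=5):
        p = P_U1[u1] * P_U2[u2]
        p *= mechanism(z, do_z, 0.2 + 0.5 * u1 + 0.2 * u2)
        p *= mechanism(x, do_x, 0.1 + 0.5 * z + 0.3 * u1)
        p *= mechanism(y, None, 0.2 + 0.3 * x + 0.4 * u2)
        dist[(z, x, y)] = dist.get((z, x, y), 0.0) + p
    return dist


def prob(dist, **fixed):
    """Marginal probability of the event described by keyword arguments."""
    index = {"z": 0, "x": 1, "y": 2}
    return sum(p for key, p in dist.items()
               if all(key[index[name]] == v for name, v in fixed.items()))


def effect_truth(x, y):
    """P(y | do(x)), computed by intervening on X directly."""
    return prob(joint(do_x=x), y=y)


def effect_from_surrogate(x, y, z):
    """Right-hand side of eq. (1): P(y, x | do(z)) / P(x | do(z))."""
    dist = joint(do_z=z)
    return prob(dist, x=x, y=y) / prob(dist, x=x)


def observational(x, y):
    """Plain conditional P(y | x), which is confounded in this model."""
    dist = joint()
    return prob(dist, x=x, y=y) / prob(dist, x=x)
```

In this model `effect_from_surrogate(x, y, z)` agrees with `effect_truth(x, y)` at either level of z, while `observational(x, y)` does not, which is the behaviour eq. (1) and footnote 2 describe.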

The subtlety of this problem can be illustrated by noting that
in the graph in Fig. 1(a) the effect is z-identifiable from
$P(V)$ and $P(X, Y \mid \hat{Z})$ in G, whereas in the graph in
Fig. 1(b) it is not (to be shown later). The only difference
between these two graphs is the bidirected edge between the
pairs (X, Z) and (X, Y).

One might surmise that zID can be represented by a mutilated
graph in which the edges incoming to Z are cut, and that the
problem would then be solved as ordinary identifiability.
Unfortunately, this is not the case, as shown in the graph in
Fig. 1(c), where $Q = P(y \mid \hat{x})$. The option of
manipulating Z does not enable us to compute the Z-specific
causal effect of X on Y, $P(y \mid \hat{x}, z)$, which, if
available, would allow us to compute the overall causal effect
by averaging over Z. Although $Q' = P(y \mid \hat{x}, \hat{z})$
can be established from the mutilated graph, it does not help in
establishing the Z-specific causal effect, or Q.

The first formal treatment of this problem [Pearl, 1995] led to
the following sufficient condition for admitting a surrogate
variable Z for the causal effect $P(y \mid \hat{x})$:

(i) X intercepts all directed paths from Z to Y, and

(ii) $P(y \mid \hat{x})$ is identifiable in $G_{\overline{Z}}$.

These conditions are indeed satisfied in the model of Fig. 1(a)
but not in 1(b) or 1(c). Pearl’s criterion is sufficient but was
not shown to be necessary. Additionally, it was not extended to
the case where Z and X are sets of variables. At the same time,
the syntactic condition above, which requires the existence of a
do-calculus transformation into an expression containing only
do(z) terms, is declarative but not computationally effective,
since it does not specify the sequence of rules leading to the
needed transformation, nor does it tell us if such a sequence
exists. Even though do-calculus is complete for identifying
causal effects, it is not immediately clear whether it is
complete for zID.

1 We will use $P(y \mid \hat{x})$ interchangeably with $P_x(y)$
or $P(y \mid do(x))$. We will also refer to the interventional
operator do() as the “hat” operator.

Figure 1: Causal diagrams (a), (b), and (c) illustrating
z-identifiability of the causal effect $Q = P(y \mid \hat{x})$.
Q can be identified by experiments on Z in model (a), but not in
(b) and (c).

This paper provides a systematic study of z-identifiability
building on Pearl’s condition and the previous results from the
identifiability literature; our contributions are as follows:

• We provide a necessary and sufficient graphical condition for
the problem of z-identification when Z is a set of variables.

• We then construct a complete algorithm for deciding
z-identification of joint causal effects and returning the
correct formula whenever those effects are z-identifiable.

• We further show that do-calculus is complete for the task of
z-identification.

2  Notation  and  Definitions 

The basic semantical framework in our analysis rests on
probabilistic causal models as defined in [Pearl, 2000, pp.
205], which are also called structural causal models or
data-generating models. In the structural causal framework
[Pearl, 2000, Ch. 7], actions are modifications of functional
relationships, and each action do(X = x) on a causal model M
produces a new model $M_x = \langle U, V, F_x, P(U) \rangle$,
where $F_x$ is obtained after replacing $f_X \in F$ for every
$X \in \mathbf{X}$ with a new function that outputs a constant
value x given by do(X = x).

2 The expression also shows that only one level of Z suffices
for the identification of $P(y \mid \hat{x})$ for any value of y
and x. In other words, Z need not be varied at all; it can
simply be held constant by external means and, if the
assumptions embodied in G are valid, the r.h.s. of eq. (1)
should attain the same value regardless of the (constant) level
at which Z is being held. In practice, however, several levels
of Z will be needed to ensure that enough samples are obtained
for each desired value of X.

We follow the conventions given in [Pearl, 2000]. We will denote
variables by capital letters and their values by small letters.
Similarly, sets of variables will be denoted by bold capital
letters, sets of values by bold letters. We will use the typical
graph-theoretic terminology with the corresponding abbreviations
$Pa(\mathbf{Y})_G$, $An(\mathbf{Y})_G$, and $De(\mathbf{Y})_G$,
which will denote respectively the set of observable parents,
ancestors, and descendants of the node set Y in G. By
convention, these sets will include the arguments as well; for
instance, the ancestral set $An(\mathbf{Y})_G$ will include Y.
We will usually omit the graph subscript whenever the graph in
question is assumed or obvious. A graph $G_{\mathbf{Y}}$ will
denote the induced subgraph of G containing the nodes in Y and
all arrows between such nodes. Finally,
$G_{\overline{\mathbf{X}}\,\underline{\mathbf{Z}}}$ stands for
the edge subgraph of G where all incoming arrows into X and all
outgoing arrows from Z are removed.
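The graph operations just introduced are simple to state in code. The following is a minimal sketch, not the authors' implementation: the edge-set representation, the function names `mutilate` and `ancestors`, and the example graph are all assumptions made for illustration. Bidirected edges are treated as arrowheads into both endpoints, so they are removed when incoming arrows are cut and kept when outgoing arrows are cut.

```python
def mutilate(directed, bidirected, cut_incoming=(), cut_outgoing=()):
    """Edge subgraph with arrows into `cut_incoming` and out of
    `cut_outgoing` removed.

    directed   : set of (parent, child) pairs
    bidirected : set of frozenset({a, b}), each standing for a latent
                 common cause of a and b
    """
    cut_in, cut_out = set(cut_incoming), set(cut_outgoing)
    kept_directed = {(a, b) for a, b in directed
                     if b not in cut_in and a not in cut_out}
    # A bidirected edge has an arrowhead at both ends: it is an incoming
    # arrow for each endpoint and never an outgoing one.
    kept_bidirected = {edge for edge in bidirected if not (edge & cut_in)}
    return kept_directed, kept_bidirected


def ancestors(nodes, directed):
    """An(nodes): the nodes themselves plus every node with a directed
    path into them (the convention of including the arguments)."""
    result, frontier = set(nodes), list(nodes)
    while frontier:
        child = frontier.pop()
        for a, b in directed:
            if b == child and a not in result:
                result.add(a)
                frontier.append(a)
    return result


# Hypothetical example graph: Z -> X -> Y with Z <-> X and Z <-> Y.
directed = {("Z", "X"), ("X", "Y")}
bidirected = {frozenset({"Z", "X"}), frozenset({"Z", "Y"})}
cut_directed, cut_bidirected = mutilate(
    directed, bidirected, cut_incoming={"X"}, cut_outgoing={"Z"})
```

In the example, cutting arrows into X and out of Z leaves only X -> Y and Z <-> Y, so Z is no longer an ancestor of Y in the mutilated graph.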

We build on the problem of identifiability, defined below, which
expresses the requirement that causal effects must be computable
from a combination of passive data P and the assumptions
embodied in a causal graph G (without assuming any availability
of additional experimental information).

Definition 1 (Causal Effects Identifiability (Pearl)). Let X, Y
be two disjoint sets of variables, and let G be the causal
diagram. The causal effect of an action
$do(\mathbf{X} = \mathbf{x})$ on a set of variables Y is said to
be identifiable from P in G if $P_{\mathbf{x}}(\mathbf{y})$ is
(uniquely) computable from $P(\mathbf{V})$ in any model that
induces G.

The following Lemma is the operational way to prove that a
causal quantity is not identifiable given the assumptions
embedded in G.

Lemma 1. Let X, Y be two disjoint sets of variables, and let G
be the causal diagram. $P_{\mathbf{x}}(\mathbf{y})$ is not
identifiable in G if there exist two causal models $M_1$ and
$M_2$ compatible with G such that
$P_1(\mathbf{V}) = P_2(\mathbf{V})$ and
$P_1(\mathbf{y} \mid do(\mathbf{x})) \neq P_2(\mathbf{y} \mid do(\mathbf{x}))$.

Proof. The latter inequality rules out the existence of a
function from P to $P_{\mathbf{x}}(\mathbf{y})$. □
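A concrete pair of models makes the Lemma tangible. The sketch below is an illustration under stated assumptions, not an example taken from the paper: it uses the two-node graph X -> Y with a latent common cause U of X and Y, and a deliberately degenerate (non-positive) parametrization to keep the arithmetic trivial. The function name `distributions` and both structural equations are hypothetical.

```python
def distributions(f_y):
    """Observational P(x, y) and interventional P(y | do(x)) for the model
    U ~ fair coin, X := U, Y := f_y(X, U).

    Returns two dicts: obs[(x, y)] = P(x, y) and
    do[(x, y)] = P(Y = y | do(X = x)).
    """
    obs, do = {}, {}
    for u in (0, 1):
        x = u                                  # natural mechanism for X
        obs_key = (x, f_y(x, u))
        obs[obs_key] = obs.get(obs_key, 0.0) + 0.5
        for x_set in (0, 1):                   # X forced to x_set, U untouched
            do_key = (x_set, f_y(x_set, u))
            do[do_key] = do.get(do_key, 0.0) + 0.5
    return obs, do


# M1: Y is the XOR of X and the latent U.  M2: Y is constantly 0.
obs1, do1 = distributions(lambda x, u: x ^ u)
obs2, do2 = distributions(lambda x, u: 0)
```

Both models produce the same observational distribution (Y is always 0 because X equals U), yet under do(X = 1) the first gives Y = 0 with probability 0.5 and the second with probability 1, so no function of P alone can return the causal effect.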

Next, we formally introduce the problem of z-identifiability,
which generalizes the problem of identifiability: it is no
longer assumed that experimental information is entirely
unavailable; instead, there exists a set of variables Z on which
experiments were performed and whose results are now available
for use. In other words, the explicit acknowledgement of the
existence of the set Z adds a degree of freedom for the
researcher, making the analysis more flexible and perhaps more
realistic.


Definition 2 (Causal Effects z-Identifiability). Let X, Y, Z be
disjoint sets of variables, and let G be the causal diagram. The
causal effect of an action $do(\mathbf{X} = \mathbf{x})$ on a
set of variables Y is said to be z-identifiable from P in G if
$P_{\mathbf{x}}(\mathbf{y})$ is (uniquely) computable from
$P(\mathbf{V})$ together with the interventional distributions
$P(\mathbf{V} \setminus \mathbf{Z}' \mid do(\mathbf{Z}'))$, for
all $\mathbf{Z}' \subseteq \mathbf{Z}$, in any model that
induces G.

Armed with this new definition, we state next the sufficiency of
the do-calculus for zID, which is analogous to [Pearl, 2000,
Corol. 3.4.2] with respect to identification.

Theorem 1. Let X, Y, Z be disjoint sets of variables, let G be
the causal diagram, and $Q = P(\mathbf{y} \mid do(\mathbf{x}))$.
Q is zID from P in G if the expression
$P(\mathbf{y} \mid do(\mathbf{x}))$ is reducible, using the
rules of do-calculus, to an expression in which only elements of
Z may appear as interventional variables.

Proof. The result follows from soundness of do-calculus and the
definition of z-identifiability. □

It is clear that if we have an efficient procedure to establish
zID, we can immediately decide ID by setting Z = ∅. On the other
hand, to be able to establish the converse of Theorem 1, we need
to understand the conditions for non-zID, and so we state next
the analogue of Lemma 1 in this context.

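Lemma 2. Let X, Y, Z be disjoint sets of variables, and let G be
the causal diagram. $P_{\mathbf{x}}(\mathbf{y})$ is not
z-identifiable in G if there exist two causal models $M_1$ and
$M_2$ compatible with G such that
$P_1(\mathbf{V}) = P_2(\mathbf{V})$,
$P_1(\mathbf{V} \setminus \mathbf{Z}' \mid do(\mathbf{Z}')) = P_2(\mathbf{V} \setminus \mathbf{Z}' \mid do(\mathbf{Z}'))$
for all $\mathbf{Z}' \subseteq \mathbf{Z}$, and
$P^1_{\mathbf{x}}(\mathbf{y}) \neq P^2_{\mathbf{x}}(\mathbf{y})$.

Proof. Let I be the set of interventional distributions
$P(\mathbf{V} \setminus \mathbf{Z}' \mid do(\mathbf{Z}'))$, for
any $\mathbf{Z}' \subseteq \mathbf{Z}$. The latter inequality
rules out the existence of a function from (P, I) to
$P_{\mathbf{x}}(\mathbf{y})$. □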

While Lemma 2 might appear convoluted, it is nothing more than a
formalization of the statement “Q cannot be computed from
information set S alone.” Naturally, when S has two components,
(P, I), the Lemma becomes lengthy. Even though the problems of
ID and zID are related, Lemma 2 indicates that proofs of non-zID
are at least as hard as the ones for non-ID, given that proving
the former requires the construction of two models that agree on
(P, I), while proving the latter only requires the two models to
agree on the distribution P.

3 Characterizing zID Relations

The concept of confounded component (or C-component) was
introduced in [Tian and Pearl, 2002] to represent clusters of
variables connected through bidirected edges, and was
instrumental in establishing a number of conditions for ordinary
identification (Def. 1). If G is not a C-component itself, it
can be uniquely partitioned into a set C(G) of C-components. We
state this definition below; it will also play a key role in the
problem of zID (see footnote 3).

Definition 3 (C-component). Let G be a causal diagram such that
a subset of its bidirected arcs forms a spanning tree over all
vertices in G. Then G is a C-component (confounded component).
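Since a spanning tree of bidirected arcs exists exactly when the bidirected edges connect all vertices, the partition C(G) can be computed as the connected components of the bidirected part of the graph. The sketch below assumes a plain edge-list representation; the function names and the example graph are hypothetical.

```python
def c_components(nodes, bidirected):
    """Partition `nodes` into maximal sets connected by bidirected edges.

    bidirected : iterable of (a, b) pairs, order irrelevant
    """
    neighbours = {v: set() for v in nodes}
    for a, b in bidirected:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:                      # depth-first search from `start`
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(neighbours[v] - component)
        seen |= component
        components.append(frozenset(component))
    return components


def is_c_component(nodes, bidirected):
    """True when the whole node set forms a single C-component."""
    return len(c_components(nodes, bidirected)) == 1


# Hypothetical example: X <-> Y and Y <-> Z, with W unconfounded.
parts = c_components({"W", "X", "Y", "Z"}, [("X", "Y"), ("Y", "Z")])
```

Directed edges play no role here; a node without bidirected edges, such as W in the example, forms a C-component on its own.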

A special subset of C-components that embraces the ancestral set
of Y was noted by [Shpitser and Pearl, 2006] to play an
important role in deciding identifiability; this observation can
also be applied to z-identifiability, as formulated next.

Definition 4 (C-forest). Let G be a causal diagram, where Y is
the maximal root set. Then G is a Y-rooted C-forest if G is a
C-component and all observable nodes have at most one child.
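Definition 4 combines two checks that are easy to test mechanically: the bidirected edges must connect every node, and no node may have more than one child. The following sketch returns the root set (the nodes without children) when both hold. The representation, the function name `c_forest_roots`, and the example graphs are assumptions for illustration, and the node set is assumed to be non-empty.

```python
def c_forest_roots(nodes, directed, bidirected):
    """Return the root set if the graph is a C-forest, otherwise None.

    directed   : set of (parent, child) pairs
    bidirected : set of (a, b) pairs, order irrelevant
    """
    nodes = set(nodes)
    # (1) C-component: bidirected edges must reach every node.
    reached, stack = set(), [next(iter(nodes))]
    while stack:
        v = stack.pop()
        if v in reached:
            continue
        reached.add(v)
        stack.extend(b if a == v else a
                     for a, b in bidirected if v in (a, b))
    if reached != nodes:
        return None
    # (2) Every observable node has at most one child.
    child_count = {v: 0 for v in nodes}
    for parent, _ in directed:
        child_count[parent] += 1
    if any(count > 1 for count in child_count.values()):
        return None
    # Roots are the nodes without children.
    return {v for v in nodes if child_count[v] == 0}


# Hypothetical examples.
bow = c_forest_roots({"X", "Y"}, {("X", "Y")}, {("X", "Y")})
fork = c_forest_roots({"A", "B", "C"},
                      {("A", "B"), ("A", "C")},
                      {("A", "B"), ("B", "C")})
```

Here `bow` is the Y-rooted C-forest formed by X -> Y with X <-> Y, while `fork` is rejected because A has two children.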

We next introduce a structure, characterized by a pair of
C-forests, that witnesses unidentifiability. ID was shown by
[Shpitser and Pearl, 2006] to be infeasible if and only if such
a structure exists as an edge subgraph of the given causal
diagram.

Definition 5 (hedge). Let X, Y be sets of variables in G. Let F,
F' be R-rooted C-forests such that
$F \cap \mathbf{X} \neq \emptyset$,
$F' \cap \mathbf{X} = \emptyset$, $F' \subseteq F$, and
$\mathbf{R} \subset An(\mathbf{Y})_{G_{\overline{\mathbf{X}}}}$.
Then F and F' form a hedge for $P_{\mathbf{x}}(\mathbf{y})$ in
G.

The presence of this structure will prove to be an obstacle to
z-identifiability of causal effects in various scenarios. For
instance, the p-graph in Fig. 1(b) is a Y-rooted C-forest in
which $P_x(y)$ will be shown not to be z-identifiable. However,
unlike in the ID case, there is no sharp boundary here, since
Fig. 1(a) also contains a Y-rooted C-forest but $P_x(y)$ was
already shown to be zID.

We formally show next that there is a variation of this
structure that is able to capture non-zID for a broad set of
cases.

Theorem 2. Let X, Y, Z be disjoint sets of variables and let G
be the causal diagram. Then, the causal effect
$Q = P_{\mathbf{x}}(\mathbf{y})$ is not zID if there exists a
hedge $\mathcal{F} = \langle F, F' \rangle$ for Q in
$G_{\overline{\mathbf{Z}}}$.

Proof. The result is immediate. The existence of the hedge
$\mathcal{F}$ for Q in $G_{\overline{\mathbf{Z}}}$ implies that
Z cannot help in the (ordinary) identification of Q. Let us
assume that Q is zID. Note that Z does not participate in the
hedge $\mathcal{F}$, since there is no bidirected edge going
towards any of its elements in $G_{\overline{\mathbf{Z}}}$,
which is required by the definition of C-forest. Further,
consider a parametrization such that all elements of Z are
simply fair coins and disconnected from V \ Z in G.

We can now use the same proof of non-ID based on $\mathcal{F}$
to prove non-zID in G. The inequality of Q between the two
models is obvious, and the agreement of the interventional
distributions do(Z) follows since Z is disconnected from V \ Z
by the chosen parametrization. This is a contradiction since zID
has to be valid for any parametrization compatible with G, which
suffices to prove the result. □

3 The advent of C-components complements the notion of inducing
path, which was introduced earlier in [Verma and Pearl, 1990].

Figure 2: Graphs in which $P(y \mid \hat{x})$ is non-zID from
do(Z) and there is no hedge in $G_{\overline{\mathbf{Z}}}$.
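The parametrization used in the proof can be reproduced on a toy instance. The sketch below is an illustration under assumptions that are not taken from the paper's figures: the graph is Z -> X -> Y with X <-> Y (so the hedge over {X, Y} survives in the graph with arrows into Z cut), Z is a fair coin that the structural equations ignore, and the two models differ only in the mechanism for Y. All names are hypothetical and the parametrization is degenerate on purpose.

```python
from itertools import product


def model_distribution(f_y, do=None):
    """Exact joint P(z, x, y) for the model
    U, Z ~ independent fair coins, X := U, Y := f_y(X, U).

    `do` maps a variable name ("Z" or "X") to the value it is forced to.
    """
    do = do or {}
    dist = {}
    for u, z_coin in product((0, 1), repeat=2):
        z = do.get("Z", z_coin)
        x = do.get("X", u)
        y = f_y(x, u)
        dist[(z, x, y)] = dist.get((z, x, y), 0.0) + 0.25
    return dist


def marginal_y(dist):
    """Marginal distribution of Y from a joint over (z, x, y)."""
    out = {}
    for (_, _, y), p in dist.items():
        out[y] = out.get(y, 0.0) + p
    return out


def y_model_1(x, u):
    return x ^ u          # Y depends on X and on the latent U


def y_model_2(x, u):
    return 0              # Y is constant


same_observational = model_distribution(y_model_1) == model_distribution(y_model_2)
same_surrogate = all(
    model_distribution(y_model_1, {"Z": z}) == model_distribution(y_model_2, {"Z": z})
    for z in (0, 1))
effect_1 = marginal_y(model_distribution(y_model_1, {"X": 1}))
effect_2 = marginal_y(model_distribution(y_model_2, {"X": 1}))
```

The two models agree on P(V) and on the distributions under do(Z = z) for both values of z, but disagree on the effect of do(X = 1) on Y, which is the pattern of agreement and disagreement that Lemma 2 asks for.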

Consider the next Corollary in regard to the p-graph, which is
the smallest example in which Z could aid in the
z-identification of Q but Q is still not z-identifiable from
do(Z). This and similar structures that prevent zID will be one
of the base cases for our proof of completeness, which requires
a demonstration that whenever the algorithm fails to z-identify
a causal relation, the relation is indeed non-zID.

Corollary 1. $P_x(y)$ is not zID in the p-graph.

Proof. This follows directly from Theorem 2 since there exists a
hedge in $G_{\overline{Z}}$. □

The result of Theorem 2 still does not characterize the zID
class, which suggests that the machinery used to prove
completeness in the ID class is not immediately applicable to
the zID class.

For instance, consider the graph in Fig. 2(a) (called here
bow-graph), which does not have a hedge for Q in
$G_{\overline{Z}}$ but is still non-zID. The bow-graph coincides
as an edge subgraph with Fig. 1(a) (note the C-component induced
over {X, Y, Z}), which turns out to be zID.

This is an interesting case, since up to this point, in ordinary
identification, it was enough to locate a hedge for Q as an edge
subgraph of the input diagram, and all graphs sharing this
substructure were equally unidentifiable (see Thm. 4 in
[Shpitser and Pearl, 2006]); this is no longer true here since Z
needs to be taken into account. In particular, note that the
directed edges outside a C-component play a critical role in the
zID problem, as the bow-graph demonstrates.

Figure 3: $P(y \mid \hat{x})$ is zID from (P, do(Z)) in the
graphs in the first row (a-d), but not in the second row (e-h).

Finally, we expand Pearl’s condition [Pearl, 2000, pp. 87] in
the following directions. We extend his condition, in the
intuitive way, to the case where Z is a set of variables and, in
turn, we supplement the sufficient part with its necessary
counterpart. We thus obtain a complete characterization of the
zID class, as shown below.

Theorem 3. Let X, Y, Z be disjoint sets of variables and let G
be the causal diagram. The causal effect
$Q = P(\mathbf{y} \mid do(\mathbf{x}))$ is zID in G if and only
if one of the following conditions holds:

a. Q is identifiable in G; or,

b. There exists $\mathbf{Z}' \subseteq \mathbf{Z}$ such that the
following conditions hold,

(i) X intercepts all directed paths from Z' to Y, and

(ii) Q is identifiable in $G_{\overline{\mathbf{Z}'}}$ (see
footnote 4).

Proof. See Appendix. □

Let $Q = P(y \mid \hat{x})$ be the effect of interest and assume
that experiments were performed over {Z}. Q is zID from P and
do(Z) in the graphs in Fig. 3(a-d), while it is non-zID in the
graphs in Fig. 3(e-h). Except for the trivial case, Theorem 3 is
existentially quantified and it is not immediately obvious how
to efficiently select the covariates simultaneously satisfying
both conditions of the Theorem. Clearly, a naive approach could
lead to an exponential number of tests.
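The naive approach mentioned above can be written down directly: enumerate every subset Z' of Z, test condition (i) by graph search, and hand condition (ii) to an identifiability oracle. The sketch below is only an illustration of that brute-force search, not the algorithm developed in the paper. The oracle `is_identifiable` is a hypothetical callable supplied by the user; the stand-in used in the example is merely a sufficient test for a singleton X (no bidirected path from X to any of its children, a criterion from [Tian and Pearl, 2002]), and the example graph is an assumption.

```python
from itertools import combinations


def intercepts_all_directed_paths(directed, sources, targets, blockers):
    """Condition (i): every directed path from `sources` to `targets`
    passes through a node in `blockers`."""
    blockers, targets = set(blockers), set(targets)
    stack = [s for s in sources if s not in blockers]
    seen = set()
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        if v in targets:
            return False
        stack.extend(b for a, b in directed if a == v and b not in blockers)
    return True


def cut_incoming(directed, bidirected, cut):
    """Edges of the graph after removing all arrows into `cut`."""
    cut = set(cut)
    return ({(a, b) for a, b in directed if b not in cut},
            {(a, b) for a, b in bidirected if a not in cut and b not in cut})


def find_surrogate_subset(directed, bidirected, X, Y, Z, is_identifiable):
    """Try all 2^|Z| subsets Z' (the empty one covers condition a)."""
    for size in range(len(Z) + 1):
        for subset in combinations(sorted(Z), size):
            if not intercepts_all_directed_paths(directed, subset, Y, X):
                continue
            d, b = cut_incoming(directed, bidirected, subset)
            if is_identifiable(d, b, X, Y):
                return set(subset)
    return None


def no_bidirected_path_to_child(directed, bidirected, X, Y):
    """Stand-in oracle (sufficient only, singleton X assumed)."""
    children = {b for a, b in directed if a in X}
    reach, stack = set(), list(X)
    while stack:
        v = stack.pop()
        if v in reach:
            continue
        reach.add(v)
        stack.extend(b if a == v else a for a, b in bidirected if v in (a, b))
    return not ((reach - set(X)) & children)


# Hypothetical graph: Z -> X -> Y with Z <-> X and Z <-> Y.
found = find_surrogate_subset(
    {("Z", "X"), ("X", "Y")}, {("Z", "X"), ("Z", "Y")},
    X={"X"}, Y={"Y"}, Z={"Z"}, is_identifiable=no_bidirected_path_to_child)
```

On this graph the empty subset fails the oracle, while Z' = {Z} passes both conditions. The loop visits up to 2^|Z| subsets, which is the exponential cost that motivates a dedicated algorithm.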

For example, consider the graph in Fig. 3(a), which is a
variation of the bow-graph. In this graph, Q is zID using
experiments from {Z}. In turn, consider the graph in Fig. 3(e),
which is the same as 3(a) but with the bidirected edge
$W \leftrightarrow X$ added. Now, Q is no longer zID for {Z} nor
{Z, W}. If we further consider the graph in Fig. 3(b), with the
bidirected edge $W \leftrightarrow X$ removed from 3(e), Q
becomes zID not only for {Z} but also for {Z, W}. This is a
border case: note that if we input {Z, W} as the surrogate
variables for Pearl’s criterion, it will not be able to
recognize Q as zID given the existence of the directed path
$W \rightarrow Y$. Finally, if we consider the graph in Fig.
3(f), in which the directed edge $W \rightarrow Z$ is flipped
from 3(b), Q is no longer zID for either {Z, W} or {Z}.

This example can be extended indefinitely, but it is clear that
finding a set that satisfies both conditions of the Theorem, in
structures more intricate than the given 4-node example, does
not follow immediately. The next section is devoted to finding
an efficient (and complete) algorithm to solve this problem.

But for now, consider the following Corollary, which confirms
our intuition that surrogate experiments should not disturb the
causal paths (non-descendants) of the variables that are being
analyzed.

4 This condition can be rephrased graphically as “There exists
no hedge for Q as an edge subgraph in
$G_{\overline{\mathbf{Z}'}}$.”


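Corollary 2. Let G be the causal diagram,
$\mathbf{X}, \mathbf{Y} \subset \mathbf{V}$ be disjoint sets of
variables, and
$\mathbf{Z} \subseteq De(\mathbf{X})_{G_{An(\mathbf{Y})}}$. The
causal effect $Q = P(\mathbf{y} \mid do(\mathbf{x}))$ is not zID
from P and do(Z) in G if Q is not ID from P in G.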

Proof.  The  result  follows  directly  from  Theorem  3.  □ 

4 A Complete Algorithm for zID

In this section, we propose a simple extension of the ordinary
identification algorithms to solve the problem of
z-identifiability, which we call IDz (see Fig. 4).

We build on previous analysis of identifiability given in
[Pearl, 1995; Kuroki and Miyakawa, 1999; Tian and Pearl, 2002;
Shpitser and Pearl, 2006; Huang and Valtorta, 2006], and we
choose to start with the version provided by Shpitser (called
ID) since the hedge structure is explicitly employed, which will
prove instrumental in establishing completeness.

Before considering the technical results, we explain our
strategy and how our version of the algorithm relates to the
existing ones for ordinary identifiability.

(i) z-identifiability (sufficiency): Causal relations can be
solved in our context through ordinary identifiability or
through identifiability relying on the experiments performed
over Z. The current algorithms already operate on the first
part, and they proceed by exploring a sequence of equalities in
do-calculus based on the C-component decomposition. (The idea is
to apply a divide-and-conquer strategy, breaking the problem
into smaller, more manageable pieces, and then to assemble them
back when possible.) It turns out that the equalities used by
the algorithm are all in the interventional space (between
interventional distributions, except for the base cases), which
is attractive for the zID problem since certain interventional
distributions over Z are already available for use.

For instance, when steps 3 or 4 succeed in their tests and, at
the same time, the sets they introduce have non-empty
intersection with Z, we exploit the common variables, updating
the graph and respective data structures accordingly. We then
continue solving an ordinary ID instance but no longer have to
identify these variables, and they can possibly help in the
identifiability of others.

(ii) Non-z-identifiability (necessity): The algorithm proceeds
until it is not able to resolve a certain subproblem, which
implies the existence of a certain hedge. Note that the given
hedge can be different from the one used for ID in the same
graph, since the experiments over Z possibly destroyed the
original ones. Further, note that using the given hedge to prove
non-zID is not immediate since, in light of Lemma 2, more
constraints need to be satisfied in order to support

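function IDz(y, x, Z, $\mathcal{I}$, $\mathcal{J}$, P, G)

INPUT: x, y: value assignments; Z: variables with interventions
available; $\mathcal{I}$, $\mathcal{J}$: see caption; P: current
probability distribution do($\mathcal{I}$, $\mathcal{J}$, x)
(observational when $\mathcal{I} = \mathcal{J} = \emptyset$);
G: causal graph.

OUTPUT: Expression for $P_{\mathbf{x}}(\mathbf{y})$ in terms of
$P$, $P_{\mathbf{z}}$, or FAIL(F, F').

1 if $\mathbf{x} = \emptyset$, return $\sum_{\mathbf{v} \setminus \mathbf{y}} P(\mathbf{v})$.

2 if $\mathbf{V} \setminus An(\mathbf{Y})_G \neq \emptyset$,
  return IDz($\mathbf{y}$, $\mathbf{x} \cap An(\mathbf{Y})_G$, $\mathbf{Z}$, $\mathcal{I}$, $\mathcal{J}$, $\sum_{\mathbf{v} \setminus An(\mathbf{Y})_G} P$, $An(\mathbf{Y})_G$).

3 Set $\mathbf{Z}_w = ((\mathbf{V} \setminus (\mathbf{X} \cup \mathcal{I} \cup \mathcal{J})) \setminus An(\mathbf{Y})_{G_{\overline{\mathbf{X} \cup \mathcal{I} \cup \mathcal{J}}}}) \cap \mathbf{Z}$.
  Set $\mathbf{W} = ((\mathbf{V} \setminus (\mathbf{X} \cup \mathcal{I} \cup \mathcal{J})) \setminus An(\mathbf{Y})_{G_{\overline{\mathbf{X} \cup \mathcal{I} \cup \mathcal{J}}}}) \setminus \mathbf{Z}$.
  if $(\mathbf{Z}_w \cup \mathbf{W}) \neq \emptyset$,
  return IDz($\mathbf{y}$, $\mathbf{x} \cup \mathbf{w}$, $\mathbf{Z} \setminus \mathbf{Z}_w$, $\mathcal{I} \cup \mathbf{z}_w$, $\mathcal{J}$, $P$, $G$).

4 if $C(G \setminus (\mathbf{X} \cup \mathcal{I} \cup \mathcal{J})) = \{S_0, S_1, \ldots, S_k\}$,
  return $\sum_{\mathbf{v} \setminus \{\mathbf{y}, \mathbf{x}, \mathbf{z}\}} \prod_i$ IDz($s_i$, $(\mathbf{v} \setminus s_i) \setminus \mathbf{z}$, $\mathbf{Z} \setminus (\mathbf{V} \setminus S_i)$, $\mathcal{I}$, $\mathcal{J} \cup (\mathbf{Z} \cap (\mathbf{V} \setminus S_i))$, $P$, $G$).

  if $C(G \setminus (\mathbf{X} \cup \mathcal{I} \cup \mathcal{J})) = \{S\}$,

5 if $C(G) = \{G\}$, FAIL($G$, $S$).

6 if $S \in C(G)$,
  return $\sum_{s \setminus \mathbf{y}} \prod_{i \mid V_i \in S} P(v_i \mid v_\pi^{(i-1)} \setminus (\mathcal{I} \cup \mathcal{J}))$.

7 if $(\exists S')\; S \subset S' \in C(G)$,
  return IDz($\mathbf{y}$, $\mathbf{x} \cap S'$, $\mathbf{Z}$, $\mathcal{I}$, $\mathcal{J}$, $\prod_{i \mid V_i \in S'} P(V_i \mid V_\pi^{(i-1)} \cap S', v_\pi^{(i-1)} \setminus (S' \cup \mathcal{I} \cup \mathcal{J}))$, $S'$).

Figure 4: IDz: Modified version of the ID algorithm capable of
recognizing zID. The variables $\mathcal{I}$, $\mathcal{J}$
represent indices for currently active z-interventions
introduced respectively by steps 3 or 4. Note that P is
sensitive to current instantiations of $\mathcal{I}$,
$\mathcal{J}$.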

such a claim. Still, if Z is not involved in the hedge, it can
be shown that the two problems coincide. The other cases, in
which Z has non-empty intersection with the hedge, have to be
handled more carefully.
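Step 3 of the procedure in Fig. 4 can be illustrated in isolation. The sketch below rests on one reading of that step, stated here as an assumption: among the nodes not yet intervened on, those that are not ancestors of Y once arrows into the intervened nodes are cut get split into a part inside Z (handled by the available experiments) and a part outside Z (added to the intervention set as in ordinary ID). The function name `step3_sets` and the example graphs are hypothetical.

```python
def step3_sets(nodes, directed, X, I, J, Y, Z):
    """Return (Z_w, W) as described for step 3, under the stated reading.

    nodes    : all observable variables V
    directed : set of (parent, child) pairs
    X, I, J  : currently intervened variables
    Y        : outcome variables
    Z        : variables with experiments still available
    """
    intervened = set(X) | set(I) | set(J)
    # Cut arrows into intervened nodes before computing ancestors of Y.
    kept = {(a, b) for a, b in directed if b not in intervened}
    ancestors_of_y, stack = set(Y), list(Y)
    while stack:
        child = stack.pop()
        for a, b in kept:
            if b == child and a not in ancestors_of_y:
                ancestors_of_y.add(a)
                stack.append(a)
    candidates = (set(nodes) - intervened) - ancestors_of_y
    return candidates & set(Z), candidates - set(Z)


# Hypothetical chain Z -> X -> Y: once arrows into X are cut, Z is no
# longer an ancestor of Y, so it can be added as a surrogate intervention.
z_w, w = step3_sets({"Z", "X", "Y"}, {("Z", "X"), ("X", "Y")},
                    X={"X"}, I=set(), J=set(), Y={"Y"}, Z={"Z"})

# Same chain with an extra node W -> Z that is not available for experiments.
z_w2, w2 = step3_sets({"W", "Z", "X", "Y"},
                      {("W", "Z"), ("Z", "X"), ("X", "Y")},
                      X={"X"}, I=set(), J=set(), Y={"Y"}, Z={"Z"})
```

In the first example the step returns Z_w = {Z} and an empty W, mirroring the Rule 3 move that adds do(z) in the introductory derivation. In the second, W = {W} is treated as in ordinary ID while Z is again routed to the surrogate experiments.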

Note that the key difference between IDz and the original ID
implementation is in steps 3 and 4, in which possibly some
Z' ⊆ Z is added as an interventional set and kept as such until
the end of the execution. These additions can only benefit the
computation of the target Q, since it is always easier to
identify a quantity in a subgraph of the original input.

We prove next soundness and completeness of IDz.

Theorem 4 (soundness). Whenever IDz returns an expression for
$P_{\mathbf{x}}(\mathbf{y})$, it is correct.

Proof. The result is immediate since the soundness of ID was
already established [Shpitser and Pearl, 2006, Thm. 5] and is
inherited by IDz by construction. Note that adding Z' ⊆ Z as an
interventional set and not trying to “identify” it later does
not represent a problem, in the zID sense, since by assumption
we can use the interventional distributions do(Z) in the final
expression returned by the procedure. □

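Theorem 5. Assume IDz fails to z-identify
$P_{\mathbf{x}}(\mathbf{y})$ from P and do(Z) in G (executes
line 5). Then there exist $\mathbf{X}' \subseteq \mathbf{X}$,
$\mathbf{Y}' \subseteq \mathbf{Y}$,
$\mathbf{Z}', \mathbf{Z}'' \subseteq \mathbf{Z}$ such that the
graph pair G, S returned by the fail condition of IDz contains
as edge subgraphs C-forests F, F' that form a hedge for
$P_{\mathbf{x}', \mathbf{z}'}(\mathbf{y}', \mathbf{z}'')$.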

Proof. This property is only partly inherited from the original
ID, since we can add Z' ⊆ Z as interventional nodes along the
execution of IDz; we also keep track of Z'' ⊆ Z that are related
to An(Y) during the execution of the procedure (to be specified
below).

Consider G, $\mathbf{Y}_f$, $\mathcal{I}$ and $\mathcal{J}$
local to the call in which IDz exited with failure (line 5). The
set $\mathbf{Y}_f$ is such that
$\mathbf{Z}'' = \mathbf{Y}_f \cap \mathbf{Z}$ and
$\mathbf{Y}' = \mathbf{Y}_f \cap \mathbf{Y}$. Let Z' ⊆ Z be the
active part of Z in the failing call, which we kept track of
through $\mathcal{I} \cup \mathcal{J}$. The condition that
triggered failure is that the whole graph was a single
C-component. Let R be the root set of G. We can remove a set of
directed arrows while keeping the root R such that the resulting
F is an R-rooted C-forest.

Similarly to ID, note that since $F' = F \cap S$ is closed under
descendants and only single directed arrows were removed from S
to obtain F', F' is also a C-forest. Now,
$F' \cap (\mathbf{X} \cup \mathbf{Z}') = \emptyset$ and
$F \cap (\mathbf{X} \cup \mathbf{Z}') \neq \emptyset$, by
construction.  Also,  R  C  An( Y',  Z")gx  and  Z"  C 
A?r(Y)(3_7,  by  line  2  and  3  of  the  algorithm.  □ 
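
The set-level conditions used at the end of this proof can be written down as a small check. The sketch below is only illustrative: the function name and the example sets are hypothetical, and it deliberately does not verify that F and F' are C-forests or that they share the root set R.

```python
def satisfies_hedge_set_conditions(f_nodes, f_prime_nodes, intervened):
    """Check F' subset of F, F' disjoint from the intervened set,
    and F intersecting the intervened set (here standing for X union Z')."""
    f_nodes = set(f_nodes)
    f_prime_nodes = set(f_prime_nodes)
    intervened = set(intervened)
    return (
        f_prime_nodes <= f_nodes
        and not (f_prime_nodes & intervened)
        and bool(f_nodes & intervened)
    )


# Hypothetical example: F = {X, Y}, F' = {Y}, intervened set {X}.
ok = satisfies_hedge_set_conditions({"X", "Y"}, {"Y"}, {"X"})
# If F' also contained X, the disjointness condition would fail.
bad = satisfies_hedge_set_conditions({"X", "Y"}, {"X", "Y"}, {"X"})
print(ok, bad)
```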

Theorem  6  (completeness).  IDZ  is  complete. 

Proof. By Theorem 5, IDZ failure implies the existence
of X' ⊆ X, Y' ⊆ Y, Z', Z'' ⊆ Z, and C-forests
F, F' that form a hedge for P_{x',z'}(y', z''). We
proceed by cases:

Case Z' = ∅, Z'' = ∅. The construction provided by
[Shpitser and Pearl, 2006, Corollary 2] can be used
here since this case reduces to ordinary identifiability.

Case Z' = ∅, Z'' ≠ ∅. Even though Z'' is in the root
set of the hedge, and not related to the interventional
part (F \ F') where the asymmetry in the construction
usually resides (to generate inequality in Q), the
previous construction has to be used with some caution,
as given by case 1 of Thm. 3.

There is an interesting border subcase when Y' = ∅.
We need to keep track of {X, J'} since, if the
Z-interventions are added in step 3, we need not be
concerned with summing over the assignments of the
variables added, but if the Z-interventions are added in
step 4, we do have to take care of this case. Note that
we would have some hedge in a do-equality of the form
Q = Σ_{z''} P_{x'}(z'') f(x, y, ·), in which, if f(·) is
identifiable and uniformly distributed, Q would be equal
in both models and spoil the counter-example. The problem
is not difficult to fix: we just have to create a
map for f(·) that is non-uniform. (See Thm. 3.)
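
A small numeric sketch may help to see why a uniform f spoils the counter-example. It reads "uniform" as "constant across the values of z''", which is an interpretive assumption, and all numbers below are hypothetical. With a constant f, the weighted sum collapses to that constant whatever the weights are, so two models that differ only in the distribution of z'' yield the same Q.

```python
def q_value(p_z, f):
    """Q = sum over z of P(z) * f(z); p_z and f are dicts on the same keys."""
    return sum(p_z[z] * f[z] for z in p_z)


# Two hypothetical models that disagree on the distribution of z''.
p_model_1 = {0: 0.5, 1: 0.5}
p_model_2 = {0: 0.25, 1: 0.75}

# f constant in z'': both models give the same Q, so no counter-example.
f_constant = {0: 0.5, 1: 0.5}
q1_const = q_value(p_model_1, f_constant)
q2_const = q_value(p_model_2, f_constant)

# f varying with z'': the two models are now told apart by Q.
f_varying = {0: 0.25, 1: 0.75}
q1_var = q_value(p_model_1, f_varying)
q2_var = q_value(p_model_2, f_varying)

print(q1_const == q2_const, q1_var != q2_var)
```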

Case Z' ≠ ∅, Z'' = ∅. The constructions provided in
cases 2 and 3 of Thm. 3 were more involved since
it was not known a priori which C-factor yielded the
“faulty” call. In the IDZ case, we have already located
the hedge based on the trace of the algorithm, so we can
essentially use the same construction of these cases to
provide a counter-example.

Case Z' ≠ ∅, Z'' ≠ ∅. The constructions provided in
the two previous cases are not incompatible, and they
can be combined to provide a counter-example for this
scenario.

Moreover, the previous constructions were given over
the subgraph U of G, and how to extend the
counter-example to G is discussed in Theorem 3. □

Corollary 3. The rules of do-calculus, together with
standard probability manipulations, are complete for
determining z-identifiability of P_x(y).

Proof. It was already shown [Shpitser and Pearl, 2006,
Thm. 7] that the operations of ID correspond to
sequences of standard probability manipulations and
applications of the rules of do-calculus, which is also
true by construction for IDZ, and so the result follows. □

Conclusion 

This paper was concerned with a variation of the
identifiability problem in which experiments can be
conducted over a subset of the variables Z in addition to
the assumptions embodied in a causal diagram G and
the statistical knowledge given as a probability
distribution. (If Z is an empty set, the two problems
coincide.)

We provide a graphical necessary and sufficient condition
for the cases when the causal effect of an arbitrary
set of variables on another arbitrary set can be
determined uniquely from the available information. We
further provide a complete algorithm for computing
the resulting mapping, that is, a formula fusing available
observational and experimental data to synthesize
an estimate of the desired causal effects. Furthermore,
we use our results to prove completeness of do-calculus
with respect to the z-identifiability class.

Our results were developed in a non-parametric setting
in the tradition of the do-calculus. As a future
research direction, it would be interesting to explore
how experimental data can aid identification in the
linear case. This is a harder problem, since a complete
characterization of ordinary identifiability (i.e., Z = ∅)
in the linear case is still an open problem.

This paper complements two recent works on
generalizability of causal and statistical knowledge. The
first, dubbed “transportability” [Pearl and Bareinboim,
2011; Bareinboim and Pearl, 2012b], deals with
transferring causal information from an experimental
to an observational environment, potentially different
from the first. The second, called “selection bias”
[Bareinboim and Pearl, 2012c], deals with extrapolation
between an environment in which samples are
selected preferentially and one in which no preferential
sampling takes place. The extrapolation involved
in z-identification problems takes place between two
different regimes: one in which experiments are
performed over Z, and one in which future experiments
are anticipated over X. Extensions to “meta synthesis”
tasks, where information from multiple heterogeneous
sources is combined to increase the effective
sample size, are considered in [Pearl, 2012b;
2012a].